Introduction and summary


This series is about the future of testing in America’s schools. Part one of the series, this report, presents a theory of action for the role that assessments should play in schools. Part two reviews advancements in technology, with a focus on artificial intelligence that can powerfully drive learning in real time. And the third part looks at assessment designs that can improve large-scale standardized tests.


Assessments are a way for stakeholders in education to understand what students know and can do. They can take many forms, including but not limited to paper-and-pencil or computer-adaptive formats. However, assessments do not have to be tests in the traditional sense at all; they can also be carried out through teacher observations of students or portfolios of students’ work. Regardless of form, when assessments are well designed and are a component of a system of teaching and learning that includes high-quality instruction and materials, they are part of the solution, not a source of the problem. Thus, debates over whether to assess students at all fail to produce a worthwhile discussion about testing in schools and how to make assessments better.


When they are well built, standardized and nonstandardized assessments play a useful role in promoting educational equity, that is, helping all students achieve at high levels. Accordingly, this report offers an alternative to the argument that all assessments are harmful: a vision of the role all assessments should play in education and of the federal and state policy structure needed to make it a reality.


Assessments, in particular one annual standardized assessment of all public school students in reading and math, became the law of the land starting in 2001 with the renewal and renaming of the Elementary and Secondary Education Act of 1965 as the No Child Left Behind Act. The rationale for this policy is to promote equity in educational opportunity by measuring how well the public education system teaches students to master a state’s academic standards in these subjects.

Despite this laudable goal, federally required assessments are at times criticized because America’s students have made little progress since 2001 and their results correlate with race and socioeconomic status. However, one assessment alone cannot solve the problem of inequity in education. State standardized assessments look back at the end of the year and evaluate whether students learned the state’s academic standards in reading and math; they are not designed to provide information to guide teachers’ daily interactions with students. That kind of high-quality information, along with professional development in how to use student data effectively, is needed to drive learning forward. Thus, the state assessment must be part of a broader system of assessments that produce data that can evaluate, inform, and predict learning to help achieve educational equity.


The Center for American Progress found that while some of the criticisms of assessments, particularly of the annual state standardized assessment, are valid and must be addressed, not all of them have merit. Too often, the criticisms suggest that all standardized testing is harmful or not useful.


Still, improvement at the national scale is needed. The level of research and innovation required to make assessments effective can only be achieved through federal investment and programs. For example, federal policies should invest more in assessment development through the pilot authorized in the Every Student Succeeds Act (ESSA) and through existing assessment funding programs in that law. The federal government can play an important role in researching assessment design and understanding where and how it has a disparate impact on students. Finally, the federal government should use its resources to help teachers and school leaders become masters of local assessment development and use so that they have the data they need, when they need it, to guide student learning. For their part, states should develop well-thought-out systems of assessments that are based on the state’s learning standards and curriculum.

What are standardized assessments and what purpose do they serve?


Large-scale standardized assessments are just one type of assessment used in schools, and they serve two main purposes: to predict student performance against a set of benchmarks, or to determine how many of the test’s benchmarks students reached by the end of the year. That is, standardized assessments can be designed to be predictive or evaluative.


A standardized assessment presents test-takers with the same questions or the same types of questions and is administered and scored in the same way.' Designed to provide consistent results, standardized tests allow for comparisons between students in a single year and over time.


Standardized assessments play a prominent role in education in the United States: All public school students must take assessments in reading and math each year in grades three through eight as well as once in high school.” These tests measure what students know and can do against common state-developed, grade-level standards.


What is an academic standard? 


An academic standard is what a student should know and be able to do in a particular subject.? For example, a second-grade student in math should understand the ones and tens places in two-digit numbers.


Teachers are required to teach these standards and can use a wide variety of instructional materials and approaches to guide their students’ learning. Thus, although federal law requires states to hold all students within the state to the same academic standards, instructional materials and practices vary in quality, so not all students have the same opportunity to learn and master those standards.*

The other types of assessments in education, which are used to predict student performance and inform instruction, are given more frequently during the school year and help guide teachers, administrators, and even parents in providing students with the right supports at the right time.* Importantly, not all assessments produce a numerical score. Some, for example, take the form of teacher observations of student work and produce a descriptive assessment.


3 technical qualities of assessments required by federal law


This text box provides definitions of validity, reliability, and comparability in assessments 
and why they matter: 


* Validity refers to how accurately and fully a test measures the skills it intends to 
measure.’ For example, if an algebra test includes some geometry questions, that test is 
not a valid measure of algebra. 


* Reliability refers to the consistency of test scores across different testing sessions, different editions of the test, and different scorers. Reliability indicates how consistently the test measures the knowledge and skills it should and helps ensure that scores reflect those skills rather than measurement error.®


* Comparability allows test scores to be compared even if students took the test at different times, in different places, and under different conditions.? For example, test developers may design a test to be administered either by computer or on paper and account for these differences so that results can be compared.


These requirements help ensure apples-to-apples comparisons between the results of the 
test given on different days and under different conditions. 
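Reliability across scorers is often checked by having two raters score the same set of responses and measuring how often they agree beyond what chance alone would produce. The sketch below computes Cohen’s kappa, one common inter-rater agreement statistic. It is an illustration only: the report does not say which statistic states or test developers use, and the function name and rubric scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Agreement between two scorers, corrected for chance agreement.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of scores")
    n = len(rater_a)
    # Observed agreement: share of responses given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater assigned scores independently,
    # at that rater's own overall rate for each score.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single score")
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores on a 0-4 rubric given by two raters to the same five essays.
rater_1 = [2, 3, 3, 1, 2]
rater_2 = [2, 3, 2, 1, 2]
print(round(cohens_kappa(rater_1, rater_2), 4))  # 0.6875
```

Here the raters agree on four of five essays (80 percent), but because some agreement would occur by chance, kappa is lower than the raw agreement rate.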


A state assessment’s technical qualities are one tool to prioritize equity because they help ensure that test results can answer the question, “How well are all students meeting a state’s college- and career-ready standards in reading and math and growing in their knowledge?” Here, “college- and career-ready” means that when students meet or exceed the academic standards in reading and math, they qualify to enroll in credit-bearing college courses without needing remedial classes to make up for unfinished learning.


4 Center for American Progress | Future of Testing in Education: Effective and Equitable Assessment Systems 


History of standardized assessments in the United States


Schools and standardized testing both began within the first 100 years after the United States’ founding, and they did so against a backdrop of systemic racism and white supremacy.’° Prior to the 1860s and emancipation, enslaved people and nonwhite people were barred from accessing education and were often punished if they were discovered learning to read and write. Education during this period was reserved for the white elite.'!


It was within this context that the first uses of standardized tests in American education began.


College admissions, general intelligence, and K-12 achievement tests 


As in American education more broadly, there is a deep history of racism within standardized assessments. The earliest assessments in America were oral qualifying exams for college admissions prior to 1840."


The 1840s marked the first uses of standardized tests in public schools. In 1845, educator Horace Mann developed common exams for students in Boston Public Schools in an attempt to understand the quality of teaching and learning."


Mann’s test sparked psychologist Edward L. Thorndike’s quest for other measures of intelligence; Thorndike believed that society would benefit from systematically sorting and segregating students by academic ability. Seven states (California, Kansas, Massachusetts, Michigan, New Jersey, New York, and Pennsylvania) used these exams from 1900 to 1910.


In 1905, psychologist Alfred Binet, commissioned by the French government, developed an intelligence test to identify learning deficiencies among what were described as “slow children who would not profit significantly from schooling.”!> And in 1916, Stanford University psychologist Lewis Terman took Binet’s original test and created the Stanford-Binet Intelligence Scales to sort students by ability into college or vocational pathways. This exam is still used today."


During World War I, the U.S. Army used a multiple-choice test, the Army Alpha, to measure soldiers’ mental abilities for the purpose of sorting, assigning, and discharging them. This test would become the model for future standardized assessments.’’ In 1919, Terman transformed the Army Alpha test into the National Intelligence Tests for school students, selling more than 400,000 copies in 11 months."®


Testing to uphold white supremacy 


Literacy tests were used to disenfranchise Black men after ratification of the 15th Amend- 
ment in 1870, until enforcement of the Voting Rights Act of 1965 outlawed literacy tests 
and other methods of keeping Black people from exercising their right to vote.’ 


In the early 20th century, intelligence tests were used to determine which immigrants were “undesirable” and should not be permitted entry into the country.?° Federal law in 1915 required anyone who failed the test to be turned away.”"


Testing for public school evaluation and accountability 


The use of standardized testing in schools spread nationwide after Iowa developed tests for its high school students. In 1935, the first Iowa Test of Basic Skills was administered to students in grades six through eight. Other states began using the Iowa assessment, which remained the most-used achievement test in the nation for 50 years.


But the role of testing in schools shifted in the 1970s after the then-U.S. commissioner of education created the first National Assessment of Educational Progress (NAEP) in 1969.” The NAEP sought to provide a snapshot of the progress of education in America, using the latest testing technology to produce sound and reliable results by assessing a representative sample of the nation’s students. The NAEP marks the beginning of the modern era of standardized assessments used in schools to evaluate learning, and a shift away from measuring intelligence toward measuring achievement against academic standards.

In the 1990s, a handful of states developed statewide testing systems that used sampling methods much like the NAEP’s to take a representative snapshot of how well students perform on academic tests in various subjects.”


Federal K-12 education laws and testing 


Simply put, today’s federal K-12 education laws ask states, in exchange for federal funding, to ensure that students are meeting grade-level benchmarks in reading and math. These benchmarks are set by experts in math and reading as well as in psychometrics, or the measurement of learning.** The benchmarks increase in complexity grade by grade, so that when students complete 12th grade, they are ready for the academic demands of college (whatever form that might take) or their chosen career path. This report and the law refer to standards like these as high or rigorous standards. Federal law also asks states to evaluate schools based on these results and to report the results publicly.


Federal policy did not start out this way; it has evolved since the 1990s, when states began adopting academic standards. The federal Improving America’s Schools Act of 1994 asked states, for the first time, to apply the same standards in reading and math to all students and to assess their progress in learning the standards.”


In 2001, Congress updated the Improving America’s Schools Act, renaming it the No Child Left Behind Act. The updated law required states to use those test results to evaluate schools and identify which ones needed improvement.” States published the results every year and shared them with parents. A 2011 federal initiative under the Obama administration, called ESEA Flexibility, allowed states to use additional criteria to evaluate schools, but in most states those criteria still consisted primarily of standardized test scores.”’


The follow-up to the No Child Left Behind Act, the Every Student Succeeds Act, maintains much of the policy from ESEA Flexibility. For example, ESSA asks states to use both test scores and other criteria to evaluate all public schools. It also requires states to identify a subset of their lowest-performing schools to receive additional support to help them improve.


The debate about standardized testing in schools often misses that the assessment requirements in federal law serve a purpose: They are one way the law helps ensure that all students receive a high-quality education through the public education system. At its heart, ESSA is a civil rights law that provides additional resources to low-income students. It also protects the quality of education by asking states to ensure that all children learn the knowledge and skills that will help them in college and their careers. Measuring student progress toward a state’s learning standards through an annual assessment is one way to know whether all students are on track, using a common measuring stick.


The civil rights goals of federal K-12 education laws 


The first version of ESSA, then called the Elementary and Secondary Education Act (ESEA), was passed in 1965 on the heels of Brown v. Board of Education in 1954 and the Civil Rights Act of 1964, both of which aimed to tackle segregation and discrimination. ESEA was intended to give students from low-income families a chance at an equal education.” Likewise, the Education for All Handicapped Children Act of 1975, now known as the Individuals with Disabilities Education Act (IDEA), ensured that all students with disabilities received this same opportunity.” Before its passage, many students with disabilities were excluded from traditional classrooms.*°


The first version of ESSA centered on the role that money plays in education. It 
gave additional funding to schools in under-resourced communities whose local 
property tax bases did not provide the same amount of resources as schools in 
wealthy neighborhoods. 


ESSA’s role has more recently evolved to address not just education funding but also education quality. After years of flat and even declining educational outcomes, documented in the 1983 U.S. Department of Education report “A Nation at Risk,”*' the law began to play a role in the effort to improve the quality of public school education.» At the heart of this law, and what it requires, is an effort to include all students in public education and to hold them to the same high expectations. It does so by asking states to use their annual assessments as one measure of educational progress.


The goals of inclusion, high standards, and educational progress are the right goals. And the law has been effective in getting states to raise their standards and to include all students’ progress on those standards in their school accountability systems. For example, states must certify that students who meet the standards when they graduate from high school can enroll in college without needing remedial coursework to catch up on missed learning. And every year, states must calculate the share of students who met grade-level benchmarks, using as the denominator either the number of students who took the state assessment or 95 percent of all students, whichever is greater.
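A minimal sketch of that participation calculation follows. It assumes the ESSA rule that the denominator is the greater of the number of students tested and 95 percent of all students, a rule that applies to each student group as well as to the school as a whole. The function name and the counts are hypothetical.

```python
def proficiency_rate(num_proficient: int, num_tested: int, num_enrolled: int) -> float:
    """Share of students meeting grade-level benchmarks (hypothetical helper).

    The denominator is the greater of the number of students tested and
    95 percent of enrolled students, so a school cannot raise its reported
    rate simply by testing fewer students.
    """
    if num_enrolled <= 0:
        raise ValueError("enrollment must be positive")
    denominator = max(num_tested, 0.95 * num_enrolled)
    return num_proficient / denominator


# Hypothetical school: 200 students enrolled, 180 tested (90 percent), 120 proficient.
rate = proficiency_rate(num_proficient=120, num_tested=180, num_enrolled=200)
print(round(rate, 3))  # 0.632, because the denominator is 190 rather than 180
```

Using the larger denominator means that untested students beyond the 5 percent allowance are effectively counted as not meeting the benchmark.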

But incremental progress on test scores, and disparities in that progress between student groups, suggest the law has been less effective in ensuring educational progress. This is in part because education is a complex process in which students learn information, gain experience, and make sense of it all in a way that is useful in forging their path in life.*’ The complexity of learning at any age cannot and should not be measured primarily by a single test.


Furthermore, research documents that students’ basic needs must be met for them to be ready to learn. This is especially true in early childhood, when the brain develops more than at any other time in life.**


However, as the adage goes, you cannot manage what you cannot measure. While state assessments are not a silver bullet for improving the education system, they are a critical part of that process. The role that state standardized assessments should play in education is to improve the teaching and learning system. State test results at the school district level, for example, should inform what resources and supports teachers need to improve their instruction. At the state level, results can inform state efforts to provide more resources to districts needing additional support.


Is the issue with the test, or how the test is administered and used?


Opposition to standardized testing in schools stems in part from the history of racism in testing, and that concern is understandably not just historical: Vestiges of this past remain in today’s tests. Racism in testing needs to be unpacked and addressed fully.


That said, some critics of standardized testing in schools miss that there are distinct issues to be acknowledged and then addressed: issues with the test itself, with how the test is given, and with how the test results are used. These issues are too frequently treated as one, so critics often respond by calling for the annual state test to be thrown out entirely. But to address these issues, and to make future assessments of student learning better, policymakers must understand each issue separately, as each will require different policy remedies and technical fixes.


Criticisms of annual state standardized tests include the following:

* The tests are biased.
* They take too long to complete.
* Students experience stereotype threat when taking tests. Stereotype threat is an unconscious response by a member of a certain group to a negative stereotype about that group.*’
* The results are not useful for teachers.
* The use of these tests narrowed the curriculum used in schools.
* The tests resulted in teaching to the test.
* The results are used to take money away from schools.

The authors organize these criticisms as outlined in Table 1.


TABLE 1
An organizing framework highlighting common criticisms of state standardized assessments

Assessment itself
* Punishes students: Assessment is biased
* Punishes teachers: Do not know what students will be tested on
* Punishes schools: N/A

How the assessment is given
* Punishes students: Assessment is too long
* Punishes teachers: Teachers are incentivized to teach to the test; narrows curriculum
* Punishes schools: Hijacks the computer lab

How the assessment results are used
* Punishes students: Causes stereotype threat or reinforces a sense of failure
* Punishes teachers: Results are not useful for teachers; teachers can get fired based on results
* Punishes schools: Schools get less money

Source: Created by the authors.

Not all criticisms of the state assessment have merit, nor do their real or perceived impacts affect students, teachers, and schools in the same way. To illustrate this, Table 1 presents these criticisms according to their real or perceived impact on students, teachers, and schools, as well as by whether the impact stems from the assessment itself, how the assessment is administered, or how the assessment results are used. Understanding the effects and their source will help policymakers, administrators, and educators identify solutions that address the root cause.


This section dissects these criticisms through a fact-based review, examining the purported impacts to determine whether they have merit and to what degree. The analysis cites sources for claims where necessary.


Common criticism: The state standardized test is biased 
Is this claim true or false? It is true, but there are multiple issues to understand. 


When it comes to assessments, the term “bias” has a specific meaning. Bias happens when student inputs (their answers) are misinterpreted or misevaluated and, as a result, scored differently for some groups of students than for others.


There are three areas of the test where bias might occur: what it is trying to measure 
(the construct), how it is trying to measure (the method), or the test question itself 
(the item): 


* Construct bias is an error in the measurement of the skill; for example, an item intended to measure verbal skills instead measures listening skills.

* Method bias is an error arising from the sample of students, the test form itself (for example, a confusing form), or the administration (for example, confusing procedures for giving and collecting test forms).

* Item bias occurs when the test question itself is ambiguous or is more or less familiar to certain test-takers because of cultural influences.


Since standardized testing began, there have always been racial patterns in the results, suggesting bias. While this bias was initially by design, as outlined in the history of racism and assessments section of this report, modern assessment development techniques seek to eliminate bias. However, a 2010 study of the SAT confirmed a particular type of bias within its assessments: item bias.

Bias in the news


A study of the SAT published in 2010 found that harder test items favored Black test-takers and easier questions favored white test-takers.** That is, on the harder questions, Black students answered correctly at a higher rate than white students with similar overall scores, while on the easier questions, white students answered correctly at a higher rate.


Why would this situation be the case? 


Researchers theorize that the easier questions use more casual, everyday language that is part of the dominant white culture. The study also suggests that the way the SAT is scored holds Black students’ scores down because the easier questions receive more points.


Therefore, if the harder questions received more points, this bias could be addressed when the test is scored. This example shows that bias in one aspect of a test can, at least in part, be balanced out in the scoring process. However, this method is not a complete answer to rooting out bias or its impact in assessments.


When racial patterns consistently appear in the results, bias is present somewhere in the test, whether in the construct, the method, the items, or some combination of these.


Bias in state standardized tests for school accountability 
Regarding state standardized tests used for federal accountability, states conduct rigorous analyses to detect and remove all types of bias and must submit this evidence to the U.S. Department of Education for their assessment systems to be approved.


The department runs a peer review process for each state’s assessment system, with reviewers who are experts in test development and curriculum as well as teachers and local assessment administrators. The evidence takes the form of analyses of test-taker results; a test cannot be analyzed in this way before it is given, because experts must see evidence of how students interacted with it.
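The report does not say which statistical procedures states use to detect item bias. One widely used approach for this kind of item-level screening, named here as an illustration rather than as any state’s method, is the Mantel-Haenszel procedure for differential item functioning (DIF): students from a reference group and a focal group are matched on total score, and the odds of answering a single item correctly are compared within each score level. The sketch below computes the Mantel-Haenszel common odds ratio and its ETS delta-scale transform for one item. The `Stratum` structure and all counts are hypothetical, and operational programs add significance tests and classification rules on top of this.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """Counts for one total-score level (students matched on overall score)."""
    ref_correct: int    # reference group, answered the item correctly
    ref_wrong: int      # reference group, answered incorrectly
    focal_correct: int  # focal group, answered correctly
    focal_wrong: int    # focal group, answered incorrectly


def mantel_haenszel_odds_ratio(strata: list[Stratum]) -> float:
    """Common odds ratio across score levels; 1.0 means no DIF on this item."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.ref_correct + s.ref_wrong + s.focal_correct + s.focal_wrong
        if n == 0:
            continue  # skip empty score levels
        numerator += s.ref_correct * s.focal_wrong / n
        denominator += s.ref_wrong * s.focal_correct / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_delta_dif(alpha: float) -> float:
    """ETS delta-scale transform; negative values mean the item favors the reference group."""
    return -2.35 * math.log(alpha)


# Hypothetical counts for one item at three total-score levels.
item_strata = [
    Stratum(ref_correct=20, ref_wrong=30, focal_correct=15, focal_wrong=35),
    Stratum(ref_correct=40, ref_wrong=10, focal_correct=35, focal_wrong=15),
    Stratum(ref_correct=45, ref_wrong=5, focal_correct=44, focal_wrong=6),
]
alpha = mantel_haenszel_odds_ratio(item_strata)
print(round(alpha, 3), round(mh_delta_dif(alpha), 3))
```

Matching on total score is the key design choice: it separates an item that behaves differently for equally prepared students from overall score gaps between groups.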


There is no question that bias exists in standardized testing, and there is no question that standardized testing is one tool to understand how students are doing against common, challenging standards of learning. Therefore, future versions of tests should pilot new test items on a broader range of students from different racial and ethnic backgrounds to minimize, or better yet eliminate, bias. And where bias occurs in the test construct, how the skills and knowledge are being assessed must be better understood and addressed.

However, while eliminating bias is a necessary step, it is not sufficient on its own if educators are to take an anti-racist approach to teaching and learning.


Assessments and cultural competence 

Solely addressing bias in assessments is an incomplete response; assessments are one part of a larger system of teaching and learning, which also includes standards, instructional materials, and instructional practice. These elements must be evaluated for bias and addressed as an entire system.


Synchronizing these parts of the teaching and learning experience extends beyond academic content to the interactions between teachers and students, the climate of the school, and the larger sociocultural context. This entire process is known as cultural synchronization, and the practice of applying it to teaching is called culturally responsive pedagogy.*”


Culturally responsive pedagogy is an attempt to create continuity between what students experience at home and what they experience at school. For example, students see people who share their race and ethnicity represented in the topics they study. When students have access to culturally responsive pedagogy, they experience academic success and develop cultural competence and the ability to question the current social order, which allows them to draw a sense of empowerment from their history and culture.*®


Common criticism: It is hard for teachers to prepare students without knowing the test material


Is this claim true or false? It is neither. 


Federal law places a premium on test security. Test results are used to evaluate school performance and must be a fair and accurate representation of student performance on the test. Therefore, only test developers know the exact test items.


This does not mean that teachers do not know what kinds of test items will be included, however. Two groups, the Smarter Balanced Assessment Consortium (SBAC) and the Partnership for Assessment of Readiness for College and Careers (PARCC), each developed an annual statewide assessment, and more than 40 states first used these assessments in 2014. Each group releases test items used on previous tests, which reflect the kinds of items included in future tests.*? Additionally, both consortia analyze student responses as a resource for teachers.

Common criticism: The tests take too long, lessening time for instruction


Is this claim true or false? It is neither, because the answer depends on what the users of the test results want to know about student performance.


The SBAC and PARCC assessments take about eight to nine hours in total to complete for reading and math. Their length stems from the legal requirement that the test measure the full range and depth of the state’s grade-level standards through formats that include not just multiple-choice questions but also constructed-response questions, or written answers. These types of test questions take more time to complete than filling in an answer bubble.


If policymakers, educators, and parents value knowing how well students have learned the full set of standards for their grade level, and not just how they answer multiple-choice questions, then an eight- or nine-hour test will provide that information.

However, if policymakers, educators, and parents prefer a high-level overview of whether students have learned at grade level, then a shorter exam will suffice.


Common criticism: Students experience stereotype threat when taking the test


Is this claim true or false? It is to be determined. 


Stereotype threat occurs when a member of a certain social group is at risk of experiencing an unconscious response to a negative stereotype about that person’s own group.*°


More than 300 studies show the impacts of stereotype threat, which include limiting one’s aspirations in a field of study or career. Most of the studies are on college students and other adults, and none are specific to the state test. The studies examine participants’ performance on tasks as well as on tests.’


Like bias, stereotype threat is a phenomenon that probably exists among public school students, and it needs to be studied further to understand how it might be affecting student performance and how it can be mitigated.



Common criticism: Results are not useful for teachers to employ in their practice


Is this claim true or false? It is false. 


Annual state assessment results are used to evaluate, at year’s end, whether students met the state’s academic standards for their grade. As a result, they are not designed for teachers to use in their daily practice to customize instruction for students.


However, annual state assessments will show patterns for entire classrooms of students, and teachers can use this information to know generally what students did and did not learn and adjust their approach accordingly for the next school year’s students.


Common criticism: State standardized tests narrowed the curriculum 


Is this claim true or false? It is true, and it happened because of how the tests were used to evaluate schools.


In the period from 1987 to 2003, the amount of time devoted to different subjects 
held steady in public elementary schools: two hours to English, one hour to math, 
and a half-hour each to social studies and science. 


However, since the No Child Left Behind Act passed in 2001 and required reading and math test results to be included in school accountability, 62 percent of schools in a nationally representative sample, and 75 percent of schools identified as in need of improvement, increased the time given to math and English by about half and decreased the time given to other subjects.”


Common criticism: The tests result in teaching to the test


Is this claim true or false? It is true.


A 2007 review of 49 studies found that 80 percent of them reported a change in curriculum and an increased focus on teacher-led instruction.


Generally, “teaching to the test” means “teaching in a manner that is not considered optimal for learning standard content or skills, but is believed to improve test performance.”**
Is teaching to the test always bad? Not necessarily. An assessment covers only a subset of the standards. If a teacher focuses solely on that subset, then students miss out on other important content and skill development. But if tests are aligned with the depth and breadth of the standards, then teaching to the test some of the time can be advantageous.


Common criticism: Test results are used to take money from low-income schools


Is this claim true or false? It is false, although there are nuances to understand. 


The No Child Left Behind Act measured school performance through a construct called adequate yearly progress (AYP). AYP was based on the percentage of students who achieved a score of proficient or above on the state test. Students’ scores of proficient or above were supposed to signify that they had met or exceeded the grade-level standards. For comparison, a grade of C usually means proficient on an A-F grading scale.


When schools missed AYP targets, districts set aside 20 percent of their Title I funding, which provides additional education programs in schools in low-income communities, to pay for supplemental education services and school choice. These funds covered tutoring as well as transportation for students to attend higher-performing schools that also received Title I money.


As a result, districts did not exactly lose Title I money; however, they had to use it for a specific purpose when they did not meet AYP targets. Because less money was left to spend on services and resources for individual schools, the claim that test results lead to less money for schools has some merit, but it is not the entire picture.


The current version of the law, ESSA, takes a different approach to deploying 
resources to schools that need them. It eliminates AYP and instead requires states 
to provide additional money to some schools that are classified as low-performing, 
many of which have been poorly funded for decades, to help them improve.** 
Common criticism: Test results are used to close schools serving Black and Latinx students


Is this claim true or false? It is true, but only for a subset of schools that were closed. 


Between 2003 and 2013, approximately 2 percent of all U.S. public schools closed.*° 
Many of these were due to declining populations. Some of them, however, were due 
to poor student outcomes, as indicated in part by state assessment results. 


A 2017 study of school closures across 26 states found that 1,522 schools with state assessment scores in the bottom 20 percent closed between 2006-07 and 2012-13. In the study, schools with higher rates of Black and Hispanic students were more likely to close than similarly performing schools with smaller shares of students who are racial and ethnic minorities.”
Recommendations: The role that assessments should play in education


Assessments should drive excellent teaching and ensure that all students learn at high levels. To do so, education policy and practice must encompass a broader range of assessments so that schools have complete and effective assessment systems. This system would include predictive, informative, and evaluative assessments based on the state’s standards and curriculum. Such a system would be based on three principles:


1. Assessments should be used only for their three intended purposes: 1) to predict student performance, 2) to inform instruction, or 3) to evaluate learning.** A complete system would include assessments that serve each of these purposes, and there should not be too many assessments that serve the same purpose.


2. All assessments should align with the state’s academic standards and with high- 
quality instructional materials. This alignment allows for a tighter integration 
between what students learn in class and the items that will be included on the 
assessment.” Additionally, assessment results will send consistent signals about 
how well students are learning the standards and what they must continue to 
learn to master the full range of standards by year’s end.


3. Effective assessment systems use the data appropriately and for the right 
audiences. For example, teachers should use predictive assessment results to 
inform what standards students must still learn to be able to grasp the content 
of the first lesson in a unit of instruction. Predictive assessments can also shed 
light on whether students are on track to meet benchmarks on end-of-year or 
other evaluative assessments. On the other hand, district administrators and 
policymakers can use evaluative assessment results to inform what types of 
additional supports students may need to achieve the standards. Because policymakers, administrators, and educators deploy different tools in the education system, all of these stakeholders must know how to interpret assessment results appropriately.
Assessment audits can help states and districts understand which assessments they currently use and what purpose each one serves. An audit can ensure that students are not overassessed for evaluation purposes while still receiving assessments that predict their learning and inform instruction.*° Conducting such an audit can be a useful first step in building an effective and balanced assessment system.


A case study for local assessments: Finland 


For years, Finland enjoyed top rankings on the Program for International Student Assessment, an international test of 15-year-olds in Organization for Economic Cooperation and Development member countries. Though Finland consistently outranks most other countries, its scores have declined, a puzzling development for researchers.*!


Despite the decline, Finland credits its educational success to its heavy investment 
in teacher training as well as its model for using independent and group projects as 
ways to engage students in their learning. 


Notably, Finland takes a very different approach to assessments than the United States. The country eliminated its national evaluative assessment and instead allows teachers to design their own assessments based on the national curriculum.** The same training equips teachers to design school-based projects for students. The country uses predictive standardized assessments only at the end of students’ education. Those results are considered in college admissions; they are not used to evaluate education programs, students, or teachers.


By comparison, the quality of teacher development and training in the United States varies widely and does not consistently match the quality of the opportunities available in Finland or other high-achieving countries.*?


While Finland’s example showcases the power of assessments to inform instruction, that is only one aspect of a complete and effective assessment system.


How federal policies can support the effective use of assessment in teaching and learning


Effective and complete assessment systems contribute to student learning. 
Currently, however, states are not required to have such systems in place—only an 
annual evaluative assessment, which is a practice that began with the 2001 No Child 
Left Behind Act. As a result, the focus on one test and its use to evaluate schools cre- 
ated bad incentives, as discussed earlier in this report. However, that does not mean 
that evaluative tests should not play a role in education; to the contrary, high-quality 
assessments aligned to a state’s standards are a critical tool in the teaching and learn- 
ing process. Federal policy should thus push states and districts to establish and 
maintain effective and balanced assessment systems. 


Accordingly, any future updates to ESSA should ask states to articulate a vision for how assessments fit into the teaching and learning process and then describe which assessments predict student performance, inform teaching and learning, and evaluate student learning.


At the same time, more large-scale research and development is needed in the practice and use of assessments. The federal government is well suited to this role and should:


* Fund the assessment pilot in ESSA and loosen some of its restrictions to support states in trying more innovative designs, even if those designs do not pan out. The assessment pilot gives states five years to try new assessment designs to replace the state’s annual evaluative assessment.


* Fund the development of new ways to assess students across a broad range of skills, not just through tests but through other demonstrations of student learning as well. This can be done outside of the assessment pilot through the Competitive Grants for State Assessments program.


* Fund the development of predictive, informative, and evaluative assessments.


* Fund the creation of new and better ways to report assessment results, not just to parents but to teachers and policymakers as well.


* Study bias in testing, including bias in test constructs, methods, and items.


* Reshape the teacher-focused Title II of ESSA and the Higher Education Act of 1965 so that they promote the development of better teacher training and support when it comes to assessment use.
* Partner with institutions of higher education to identify ways that the training of future psychometricians (scientists who develop assessments) should change to ensure that tomorrow’s tests do not replicate the bias and other drawbacks seen in current versions.


* Encourage states to create science, technology, engineering, and math (STEM) pathways that expose students to future careers as psychometric measurement experts, helping to build a more diverse pool of psychometricians.
Conclusion


Today’s conversation around assessments is dysfunctional and has become a zero-sum game. Assessments are neither useless nor a silver bullet for improving education. Instead, they are a vital tool to drive excellent teaching and learning. For equitable and effective testing to be fully realized, policymakers must invest in understanding the limits of today’s assessments and build on those lessons. Innovation and research should support the development of assessment systems that drive teaching and learning forward as well as evaluate student learning.


The society and workforce of tomorrow will require students not only to master academic skills but also to possess a broad range of crosscutting knowledge and abilities. America’s current assessment system does a poor job of measuring how well students are prepared for that future and of guiding educators and parents to support students in their development. Policymakers and educators must address these shortcomings as they consider the future of assessments.


To that end, future research by the Center for American Progress will highlight ways 
in which technology advances may support the measurement of a broader range of 
student knowledge and skills. 